A Load-Aware Data Placement Policy on Cluster File System
Authors
Abstract
In a large-scale cluster system running many applications, uneven I/O workloads across the cluster and disk saturation on a subset of storage servers have become a severe performance bottleneck that degrades system I/O performance. As a result, system response time increases and system throughput drops drastically. In this paper, we present a load-aware data placement policy that distributes data across the storage servers according to the load on each server and automatically migrates data from heavily-loaded servers to lightly-loaded ones. The policy is adaptive and self-managing: it operates without any prior knowledge of application workload characteristics or of the capabilities of the storage servers, and it efficiently exploits the aggregate disk bandwidth of all storage servers. Performance evaluation shows that our policy improves aggregate I/O bandwidth by 10%-20% over a random data placement policy, especially under mixed workloads.
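The abstract describes two mechanisms: placing new data on the least-loaded server, and migrating data away from overloaded servers. The sketch below illustrates that idea in Python; the class name, the block-count load metric, and the migration threshold are illustrative assumptions, not the paper's actual algorithm.

```python
class LoadAwarePlacer:
    """Minimal sketch of a load-aware placement policy (hypothetical):
    new blocks go to the currently least-loaded server, and one block
    migrates from the heaviest to the lightest server whenever their
    load ratio exceeds a threshold."""

    def __init__(self, servers, migrate_threshold=2.0):
        # Load is approximated here as the number of blocks per server;
        # a real system would track I/O rates or queue depths instead.
        self.load = {s: 0 for s in servers}
        self.migrate_threshold = migrate_threshold

    def place(self, block_id):
        # Choose the lightest-loaded server for the new block.
        target = min(self.load, key=self.load.get)
        self.load[target] += 1
        return target

    def rebalance_step(self):
        # Migrate one block from the heaviest to the lightest server
        # if the imbalance exceeds the threshold; otherwise do nothing.
        heavy = max(self.load, key=self.load.get)
        light = min(self.load, key=self.load.get)
        if self.load[heavy] / max(self.load[light], 1) > self.migrate_threshold:
            self.load[heavy] -= 1
            self.load[light] += 1
            return heavy, light
        return None
```

Calling `rebalance_step` periodically in the background approximates the automatic migration the abstract describes, without requiring prior knowledge of workload characteristics.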
Related Papers
Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
The Hadoop MapReduce framework is an important distributed processing model for large-scale data-intensive applications. The current Hadoop and the existing Hadoop Distributed File System's rack-aware data placement strategy assume a homogeneous cluster in which each node has the same computing capacity and is assigned the same workload. Default Hadoop d...
An Experimental Evaluation of Performance of A Hadoop Cluster on Replica Management
Hadoop is an open-source implementation of the MapReduce framework in the realm of distributed processing. A Hadoop cluster is a unique type of computational cluster designed for storing and analyzing large datasets across a cluster of workstations. To handle data at massive scale, Hadoop relies on the Hadoop Distributed File System, termed HDFS. HDFS, similar to most distributed file systems, sh...
Performance Improvement of Map Reduce through Enhancement in Hadoop Block Placement Algorithm
In the last few years, a huge volume of data has been produced from multiple sources across the globe. Dealing with such a volume of data has given rise to the so-called "Big Data problem", which can be solved only with new computing paradigms and platforms; this led to the emergence of Apache Hadoop. Inspired by Google's private cluster platform, a few independent software developers develope...
Sorrento: A Self-Organizing Storage Cluster for Parallel Data-Intensive Applications
This paper describes the design and implementation of Sorrento, a self-organizing storage cluster built upon commodity components. Sorrento complements previous research on distributed file/storage systems by focusing on incremental expandability and manageability of the system, and on design choices that optimize the performance of parallel data-intensive applications with low write-sharing pat...
A Cyber-Physical, Data-Centric Cooling Energy Costs Reduction Approach for Big Data Analytics Cloud
The Big Data explosion and the surge in large-scale Big Data analytics cloud infrastructure have led to burgeoning energy costs and present a challenge to existing run-time cooling energy management techniques. T*GreenHDFS, a thermal-aware cloud file system, takes a novel, data-centric approach to reduce cooling energy costs. On the physical side, T*GreenHDFS is cognizant of the uneven thermal profi...